A Globally and Superlinearly Convergent Modified BFGS 
Algorithm for Unconstrained Optimization 

Yaguang Yang*

December 27, 2012

Abstract

In this paper, a modified BFGS algorithm is proposed. The modified BFGS matrix estimates a
modified Hessian matrix proposed in [17], which is a convex combination of an identity matrix for the
steepest descent algorithm and a Hessian matrix for Newton's algorithm. The coefficient of the convex
combination in the modified BFGS algorithm is dynamically chosen in every iteration. It is proved
that, for any twice differentiable nonlinear function (convex or non-convex), the algorithm is globally
convergent to a stationary point. If the stationary point is a local minimizer where the Hessian is
strongly positive definite in a neighborhood of the minimizer, the iterates will eventually enter and
stay in the neighborhood, and the modified BFGS algorithm reduces to the BFGS algorithm in this
neighborhood. Therefore, the modified BFGS algorithm is superlinearly convergent. Moreover, the
computational cost of the modified BFGS in each iteration is almost the same as the cost of the BFGS.
Numerical tests on the CUTE test set are reported. The performance of the modified BFGS algorithm
implemented in our Matlab function mBFGS is compared to the BFGS algorithm implemented in the
Matlab Optimization Toolbox function fminunc, a limited memory BFGS implemented as L-BFGS,
a descent conjugate gradient algorithm implemented as CG-Descent 5.3, and a limited memory
descent and conjugate gradient algorithm implemented as L-CG-Descent. This result shows that the
modified BFGS algorithm may be very effective.

1 Introduction

The BFGS algorithm is one of the most successful algorithms for unconstrained nonlinear programming
[5]. Although global and superlinear convergence results have been established for convex problems [13],
it has been proved that, for general problems, the BFGS algorithm with Wolfe line search may not be
convergent for nonconvex nonlinear functions [4]. Unfortunately, the Wolfe line search condition is one of
the prerequisites for applying the Zoutendijk theorem [18] to prove the global convergence of optimization
algorithms. This motivates us to find a modified BFGS algorithm that is globally convergent for all
twice differentiable nonlinear functions, convex or nonconvex. We would also like the modified BFGS
algorithm to behave the same as the BFGS algorithm when the iterates approach a minimizer
where the strong second order condition is met, i.e., we would like the proposed algorithm to be locally
superlinearly convergent.

We will first examine how a modified Newton algorithm [17] achieves global and quadratic convergence.
It uses a modified Hessian matrix which is a convex combination of the Hessian for Newton's algorithm and
the identity matrix for the steepest descent algorithm. The most obvious advantage of using a convex
combination rather than a general linear combination of these matrices is that the modified Newton
algorithm may take the steepest descent iteration or Newton's iteration; it has the merits of both the
steepest descent algorithm and Newton's algorithm, i.e., it is globally and quadratically convergent.
Similar to the idea that the BFGS estimates the Hessian matrix, we propose a modified BFGS update that
estimates the modified Hessian matrix given in [17]. We will prove that this modified BFGS algorithm is
indeed globally convergent, as the modified Newton algorithm is. In addition, if the limit point is a local
minimizer and its Hessian is strongly positive definite in a neighborhood of the minimizer, we will show
that the iterates will enter the neighborhood and the modified BFGS will reduce to the BFGS; therefore,
the modified BFGS is locally superlinearly convergent, as the BFGS is.

The proposed modified BFGS update is different from other modified BFGS updates, such as [S] and
[14], in several aspects. First, our modified BFGS algorithm may take the steepest descent direction,
while other modified BFGS algorithms may not. Second, the selection of the parameter of the convex
combination is different from other methods.

The modified BFGS is implemented in the Matlab function mBFGS. The implementation mBFGS and an
implementation of BFGS in the Matlab Optimization Toolbox, fminunc, are tested against the CUTE test set
[1], [3]. The performance of mBFGS is compared to fminunc and to other established and/or state-of-the-art
optimization software, such as a limited memory BFGS algorithm [11] implemented as L-BFGS, a descent
conjugate gradient algorithm [5] implemented as CG-Descent 5.3, and a limited memory descent and
conjugate gradient algorithm [7] implemented as L-CG-Descent. This result shows that the modified BFGS
may be very effective.

The remainder of the paper is organized as follows. Section 2 introduces the modified BFGS algorithm. 
Section 3 discusses the algorithm's convergence properties. Section 4 provides the test results. Section 5 
presents the conclusions. 



2 The Modified BFGS Method 

Our objective is to minimize a multi-variable nonlinear (convex or nonconvex) function

$$\min f(x), \tag{1}$$

where $f$ is twice differentiable and $x \in \mathbb{R}^n$. Throughout the paper, we denote by $g(x)$, or
simply $g$, the gradient of $f(x)$, and by $H(x)$, or simply $H$, the Hessian of $f(x)$. We write $H \succ 0$
if a matrix $H$ is positive definite, and $H \succeq 0$ if $H$ is positive semidefinite. We use the subscript
$k$ for the $k$th iteration; hence, $x_0$ represents the initial point. Denote by $\bar{x}$ a local minimizer
of (1); then

$$g(\bar{x}) = 0. \tag{2}$$

We make the following assumptions in the convergence analysis.
Assumptions:

1. For an open set $\mathcal{M}$ containing the level set $\mathcal{L} = \{x : f(x) \le f(x_0)\}$, $g(x)$ is
Lipschitz continuous, i.e., there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L \|x - y\|, \tag{3}$$

for all $x, y \in \mathcal{M}$.

2. There are positive numbers $\delta > 0$, $1 > m > 0$, $M > 1$, and a neighborhood of $\bar{x}$, defined by
$\mathcal{N}(\bar{x}) = \{x : f(x) - f(\bar{x}) \le \delta\}$, such that for all $x \in \mathcal{N}(\bar{x})$ and
for all $z \in \mathbb{R}^n$,

$$m \|z\|^2 \le z^T H(x) z \le M \|z\|^2. \tag{4}$$

3. There is a positive number $L > 0$ such that for all $x \in \mathcal{N}(\bar{x})$,

$$\|H(x) - H(\bar{x})\| \le L \|x - \bar{x}\|. \tag{5}$$

Assumption 1 is required when we use the Zoutendijk theorem to establish the global convergence of the
modified BFGS algorithm. Assumption 2 indicates that for all $x \in \mathcal{N}(\bar{x})$, a strong second
order sufficient condition holds, i.e., there is a unique minimum in the neighborhood $\mathcal{N}(\bar{x})$;
it will be used to prove the global and superlinear convergence. The constants $m$ and $M$ are also used to
choose the coefficient of the convex combination. Assumption 3 is needed only for the proof of the
superlinear convergence. The $L$ in (3) may be different from the $L$ in (5), but we can always choose the
larger of the two values so that (3) and (5) hold for the same $L$, which simplifies our notation.

For most optimization algorithms, the search for the minimizer of (1) is carried out by using

$$x_{k+1} = x_k + \alpha_k d_k, \tag{6}$$

where $\alpha_k$ is the step size and $d_k$ is the search direction. For Newton's method, the search
direction $d_k$ is defined by

$$H(x_k) d_k = -g(x_k),$$

and the step size is set to $\alpha_k = 1$. If Newton's method converges, it converges fast (quadratically),
but it may not converge at all, and the computation of $H(x_k)$ is expensive. The BFGS algorithm was
developed to reduce the cost of computing $H(x_k)$ while retaining fast (superlinear) convergence.
It estimates $H(x_k)$ using the following update formula

$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}, \tag{7}$$

where

$$s_k = x_{k+1} - x_k, \qquad y_k = g(x_{k+1}) - g(x_k). \tag{8}$$

On the other hand, since Newton's algorithm may not converge, modified Newton algorithms are introduced.
In a modified Newton algorithm proposed in [17], a modified Hessian $(\gamma_k I + (1 - \gamma_k) H(x_k))$ is
suggested, and the search for the minimizer is carried out along a direction $d_k$ that satisfies

$$(\gamma_k I + (1 - \gamma_k) H(x_k)) d_k = Q(x_k) d_k = -g(x_k), \tag{9}$$

where $\gamma_k \in [0, 1]$ is carefully selected in every iteration. Clearly, the modified Hessian $Q(x_k)$
is a convex combination of the identity matrix for the steepest descent algorithm and the Hessian for
Newton's algorithm. When $\gamma_k = 1$, the algorithm reduces to the steepest descent algorithm; when
$\gamma_k = 0$, the algorithm reduces to Newton's algorithm. The global and quadratic convergence of the
modified Newton algorithm is established in [17].
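As a small illustration of (9), the following sketch solves the modified Newton system for a 2-by-2 Hessian and shows the two limiting cases. The function name, the use of Cramer's rule, and the example Hessian and gradient are illustrative assumptions, not part of [17] or of mBFGS.

```python
def modified_newton_direction(H, g, gamma):
    """Solve (gamma*I + (1-gamma)*H) d = -g for a 2x2 matrix H (hypothetical helper)."""
    # Entries of the modified Hessian Q = gamma*I + (1-gamma)*H.
    a = gamma + (1.0 - gamma) * H[0][0]
    b = (1.0 - gamma) * H[0][1]
    c = (1.0 - gamma) * H[1][0]
    d = gamma + (1.0 - gamma) * H[1][1]
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("modified Hessian is singular")
    # Cramer's rule: Q^{-1} = [[d, -b], [-c, a]] / det, and the direction is -Q^{-1} g.
    return [-(d * g[0] - b * g[1]) / det, -(-c * g[0] + a * g[1]) / det]


# Illustrative data: f(x) = x1^2 + 10*x2^2 evaluated at x = (1, 1).
H = [[2.0, 0.0], [0.0, 20.0]]
g = [2.0, 20.0]
print(modified_newton_direction(H, g, 1.0))  # gamma = 1: steepest descent direction -g
print(modified_newton_direction(H, g, 0.0))  # gamma = 0: Newton direction -H^{-1} g
```

Intermediate values of gamma interpolate between these two directions.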

Since each iteration of the modified Newton algorithm is expensive, because of the computation of
$H(x_k)$ and the selection of $\gamma_k$, which involves computing the smallest and the largest eigenvalues
of $H(x_k)$, and since the BFGS algorithm may not converge, we propose using a modified BFGS to estimate
the modified Hessian. We show that the computational cost of the modified BFGS in each iteration is
roughly the same as the computational cost of the BFGS, and that the modified BFGS algorithm is globally
and superlinearly convergent.

Note that the BFGS update $B_{k+1}$, which is an estimation of $H(x_{k+1})$, is derived from the secant
equation

$$y_k = B_{k+1} s_k. \tag{10}$$

From (9) and (10), the modified BFGS update $E_{k+1}$, which is an estimation of
$Q(x_{k+1}) = \gamma_k I + (1 - \gamma_k) H_{k+1}$, is suggested to satisfy

$$z_k = E_{k+1} s_k = (\gamma_k I + (1 - \gamma_k) B_{k+1}) s_k = \gamma_k s_k + (1 - \gamma_k) y_k, \tag{11}$$

where $\gamma_k \in [0, 1]$ will be carefully selected in every iteration. If $\gamma_k = 1$, $E_{k+1} = I$
estimates $Q(x_{k+1})$ and the modified BFGS reduces to the steepest descent method from (9). If
$\gamma_k = 0$, $E_{k+1} = B_{k+1}$ estimates $Q(x_{k+1}) = H_{k+1}$ and the modified BFGS reduces to the
BFGS method.

It is straightforward to derive the modified BFGS formula from (11) following exactly the same
procedure as in [12, pages 197-198], i.e.,

$$E_{k+1} = E_k - \frac{E_k s_k s_k^T E_k}{s_k^T E_k s_k} + \frac{z_k z_k^T}{z_k^T s_k}. \tag{12}$$

Using the Sherman-Morrison-Woodbury formula [5, page 51], we have

$$E_{k+1}^{-1} = \left(I - \frac{s_k z_k^T}{z_k^T s_k}\right) E_k^{-1} \left(I - \frac{z_k s_k^T}{z_k^T s_k}\right) + \frac{s_k s_k^T}{z_k^T s_k}. \tag{13}$$

Therefore, the modified BFGS search direction is defined by

$$E_k d_k = -g_k, \tag{14}$$

but the search direction $d_k$ is calculated by

$$d_k = -E_k^{-1} g_k, \tag{15}$$
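The inverse update (13) and the direction formula (15) can be sketched as follows. This is a minimal dense-matrix illustration in plain Python, assuming small problems; the helper names `dot`, `matvec`, `update_inverse`, and `search_direction` are hypothetical and are not taken from the mBFGS source.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def update_inverse(E_inv, s, z):
    """Product form (13): (I - s z^T/z^T s) E_inv (I - z s^T/z^T s) + s s^T/z^T s."""
    zs = dot(z, s)
    if zs <= 0.0:
        raise ValueError("z^T s must be positive to keep the update positive definite")
    n = len(s)
    # A = I - s z^T / (z^T s); the right-hand factor in (13) is A^T.
    A = [[(1.0 if i == j else 0.0) - s[i] * z[j] / zs for j in range(n)] for i in range(n)]
    AE = [[sum(A[i][k] * E_inv[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(AE[i][k] * A[j][k] for k in range(n)) + s[i] * s[j] / zs
             for j in range(n)] for i in range(n)]


def search_direction(E_inv, g):
    """Direction (15): d = -E_inv g, so no linear system has to be solved."""
    return [-v for v in matvec(E_inv, g)]


# Illustrative data: start from E_0^{-1} = I and apply one update.
s, z = [1.0, 0.0], [2.0, 1.0]
E_new = update_inverse([[1.0, 0.0], [0.0, 1.0]], s, z)
print(matvec(E_new, z))  # reproduces s, i.e. the secant condition E_{k+1}^{-1} z_k = s_k
```

The final line checks the inverse form of the secant condition (11), which any correct implementation of (13) should satisfy up to rounding.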

where $E_k^{-1}$ is updated using (13). To apply the Zoutendijk theorem in the global convergence analysis
in the next section, $d_k$ must be a descent direction, which requires $E_k \succ 0$ for all $k \ge 0$. Assume
that $E_0 \succ 0$ is selected and $E_{k-1} \succ 0$ is obtained. Since $d_{k-1}$ is a descent direction from
(14), $s_{k-1} = x_k - x_{k-1} \ne 0$. If we can select $\gamma_{k-1}$ such that $z_{k-1}^T s_{k-1} > 0$, then
from (13), it is easy to check that $v^T E_k^{-1} v > 0$ for any $0 \ne v \in \mathbb{R}^n$. Therefore,
$E_k \succ 0$, i.e., $d_k$ is indeed a descent direction for all $k \ge 0$. As a matter of fact, we want to
select $\gamma_k$ to meet the stronger conditions

$$m \le \frac{z_k^T s_k}{s_k^T s_k} \quad \text{and} \quad \frac{z_k^T z_k}{z_k^T s_k} \le M, \tag{16}$$

where $m$ and $M$ satisfy $0 < m < 1 < M < \infty$. (16) will be used to prove the global convergence of the
modified BFGS algorithm. The first inequality of (16) can be rewritten as

$$z_k^T s_k = (\gamma_k s_k^T + (1 - \gamma_k) y_k^T) s_k \ge m s_k^T s_k,$$
$$\gamma_k (s_k^T s_k - y_k^T s_k) \ge m s_k^T s_k - y_k^T s_k.$$

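Denote

$$\hat{\gamma}_k = \frac{m s_k^T s_k - y_k^T s_k}{s_k^T s_k - y_k^T s_k}. \tag{17}$$

Since $\gamma_k \in [0, 1]$, the first inequality of (16) is equivalent to

$$\begin{array}{ll}
\max\{\hat{\gamma}_k, 0\} \le \gamma_k \le 1, & \text{if } s_k^T s_k > y_k^T s_k, \\
0 \le \gamma_k \le 1, & \text{if } s_k^T s_k = y_k^T s_k, \\
0 \le \gamma_k \le 1 < \hat{\gamma}_k, & \text{if } s_k^T s_k < y_k^T s_k.
\end{array} \tag{18}$$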

The second inequality of (16) can be rewritten as

$$z_k^T z_k = (\gamma_k s_k + (1 - \gamma_k) y_k)^T (\gamma_k s_k + (1 - \gamma_k) y_k) \le M (\gamma_k s_k + (1 - \gamma_k) y_k)^T s_k, \tag{19a}$$
$$p(\gamma_k) = \gamma_k^2 (s_k - y_k)^T (s_k - y_k) + \gamma_k (s_k - y_k)^T (2 y_k - M s_k) + y_k^T (y_k - M s_k) \le 0. \tag{19b}$$

$p(\gamma_k)$ is a quadratic and convex function of $\gamma_k$. Since $M > 1$, it is easy to see that the
strict inequality of (19a) holds for $\gamma_k = 1$, hence $p(1) < 0$. Therefore $p(\gamma_k) = 0$ has two
solutions $\underline{\gamma}_k$ and $\bar{\gamma}_k$ satisfying $\underline{\gamma}_k < 1 < \bar{\gamma}_k$,
and for any $\gamma_k \in [\underline{\gamma}_k, \bar{\gamma}_k]$, (19a) holds. From (19b), if $s_k \ne y_k$,



$$\begin{aligned}
\underline{\gamma}_k
&= \frac{(s_k - y_k)^T (M s_k - 2 y_k) - \sqrt{\big((s_k - y_k)^T (M s_k - 2 y_k)\big)^2 - 4 (s_k - y_k)^T (s_k - y_k)\, y_k^T (y_k - M s_k)}}{2 (s_k - y_k)^T (s_k - y_k)} \\
&= \frac{(s_k - y_k)^T (M s_k - 2 y_k) - \sqrt{M^2 \big((s_k - y_k)^T s_k\big)^2 + 4 (M - 1) \big(s_k^T s_k\, y_k^T y_k - (s_k^T y_k)^2\big)}}{2 (s_k - y_k)^T (s_k - y_k)};
\end{aligned} \tag{20}$$

if $s_k = y_k$, then the inequality (19) holds for $\gamma_k = 0$. Therefore,

$$\underline{\gamma}_k \le 0. \tag{21}$$

Remark 2.1 (13) can be replaced by the following equivalent representation

$$E_{k+1}^{-1} = E_k^{-1} - \frac{s_k z_k^T E_k^{-1}}{z_k^T s_k} - \frac{E_k^{-1} z_k s_k^T}{z_k^T s_k} + \frac{s_k (z_k^T E_k^{-1} z_k) s_k^T}{(z_k^T s_k)^2} + \frac{s_k s_k^T}{z_k^T s_k}, \tag{22}$$

which requires fewer computational counts than (13) does. However, this equivalent formula is not
numerically stable. For the CUTE problem heart6ls, when the condition number of $E_{k+1}^{-1}$ is poor, two
of the eigenvalues of $E_{k+1}^{-1}$ generated by (22) are negative, but all eigenvalues of $E_{k+1}^{-1}$
generated by (13) are greater than zero.

Remark 2.2 Although the two formulae in (20) are equivalent, the second one is numerically much more
stable. For the CUTE test problem djtl, using the first formula results in a negative value inside the
square root because of computational error, while the second formula ensures a positive value inside
the square root.
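One way to see why a rearranged radicand can stay nonnegative in floating point is to expand the discriminant of the quadratic $p$ in (19b). The identity below is our own expansion from (19b), offered as a sketch; it is not quoted from the paper.

```latex
\begin{aligned}
&\big((s_k - y_k)^T (M s_k - 2 y_k)\big)^2
   - 4\,(s_k - y_k)^T (s_k - y_k)\; y_k^T (y_k - M s_k) \\
&\qquad = M^2 \big((s_k - y_k)^T s_k\big)^2
   + 4\,(M - 1)\,\big(s_k^T s_k \; y_k^T y_k - (s_k^T y_k)^2\big).
\end{aligned}
```

The first term on the right is a square, and the second is nonnegative because $M > 1$ and $s_k^T s_k \, y_k^T y_k \ge (s_k^T y_k)^2$ by the Cauchy-Schwarz inequality. The right-hand side is therefore a sum of nonnegative quantities, whereas the left-hand side is a difference of two possibly large numbers and can turn slightly negative through cancellation. As a quick check with $s_k = (1, 0)^T$, $y_k = (0, 1)^T$, and $M = 2$, both sides equal 8.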

Since $\gamma_k \in [0, 1]$, the second inequality of (16) therefore requires

$$\underline{\gamma}_k \le \gamma_k \le 1. \tag{23}$$

Intuitively, it is desirable to have $\gamma_k \in [0, 1]$ as close to zero as possible so that the algorithm
approaches the standard BFGS algorithm. Therefore, we want to select the smallest $\gamma_k \in [0, 1]$
satisfying (18) and (23). We consider all possible relations among $m s_k^T s_k$, $s_k^T s_k$, and
$y_k^T s_k$. Since $m < 1$, we have $m s_k^T s_k < s_k^T s_k$.

• Case 1 ($y_k^T s_k < m s_k^T s_k < s_k^T s_k$): To select the smallest $\gamma_k \in [0, 1]$ satisfying (18)
and (23), combining the first relation of (18) and (23), we have
$\gamma_k = \max\{\max\{0, \hat{\gamma}_k\}, \underline{\gamma}_k\}$. From (17), we know $\hat{\gamma}_k > 0$
in this case. Therefore, $\gamma_k = \max\{\hat{\gamma}_k, \underline{\gamma}_k\}$.

• Case 2 ($m s_k^T s_k \le y_k^T s_k < s_k^T s_k$): To select the smallest $\gamma_k \in [0, 1]$ satisfying
(18) and (23), combining the first relation of (18) and (23), we have
$\gamma_k = \max\{\max\{0, \hat{\gamma}_k\}, \underline{\gamma}_k\}$. From (17), we know
$\hat{\gamma}_k \le 0$ in this case. Therefore, $\gamma_k = \max\{0, \underline{\gamma}_k\}$.

• Case 3 ($m s_k^T s_k < s_k^T s_k \le y_k^T s_k$): To select the smallest $\gamma_k \in [0, 1]$ satisfying
(18) and (23), combining the last two relations of (18) and (23), we have
$\gamma_k = \max\{0, \underline{\gamma}_k\}$. In particular, when $s_k = y_k$, from (21),
$\underline{\gamma}_k \le 0$; therefore, $\gamma_k = 0$.

Combining all cases, we have

$$\gamma_k = \begin{cases}
\max\{\underline{\gamma}_k, \hat{\gamma}_k\}, & \text{if } m s_k^T s_k > y_k^T s_k, \\
\max\{0, \underline{\gamma}_k\}, & \text{if } m s_k^T s_k \le y_k^T s_k, \\
0, & \text{if } s_k = y_k.
\end{cases} \tag{24}$$
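The case analysis leading to (24) can be sketched in a few lines. The function name `select_gamma`, the default values of `m` and `M`, and the rearranged radicand (a sum of nonnegative terms, algebraically equal to the discriminant of the quadratic in (19b)) are assumptions made for illustration; this is not the mBFGS source.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def select_gamma(s, y, m=1e-5, M=1e5):
    """Smallest gamma in [0, 1] such that z = gamma*s + (1-gamma)*y satisfies (16)."""
    ss, ys, yy = dot(s, s), dot(y, s), dot(y, y)
    u = [a - b for a, b in zip(s, y)]  # s - y
    uu = dot(u, u)
    if uu == 0.0:
        return 0.0  # s == y: third branch of (24)
    # Lower root of p(gamma) in (19b); the radicand is written as a sum of
    # nonnegative terms so that rounding cannot make it negative.
    b = dot(u, [M * a - 2.0 * c for a, c in zip(s, y)])  # (s-y)^T (M s - 2 y)
    rad = (M * dot(u, s)) ** 2 + 4.0 * (M - 1.0) * max(ss * yy - ys * ys, 0.0)
    gamma_low = (b - math.sqrt(rad)) / (2.0 * uu)
    if m * ss > ys:
        # Case 1: here s^T s > y^T s, so the denominator of (17) is positive.
        gamma_hat = (m * ss - ys) / (ss - ys)
        return max(gamma_hat, gamma_low)
    return max(0.0, gamma_low)  # Cases 2 and 3


print(select_gamma([1.0, 0.0], [2.0, 0.0]))   # positive curvature: plain BFGS step
print(select_gamma([1.0, 0.0], [-1.0, 0.0]))  # negative curvature: gamma moves toward 1
```

In the first call the curvature ratio $y_k^T s_k / s_k^T s_k$ lies between `m` and `M`, so the BFGS value gamma = 0 is kept; in the second call $y_k^T s_k < 0$ and gamma is raised just enough to restore the first inequality of (16).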

Remark 2.3 It is worthwhile to note that if $y_k^T y_k \le M y_k^T s_k$ holds, then (19a) holds for
$\gamma_k = 0$, i.e., $\underline{\gamma}_k \le 0$. In addition, if $m s_k^T s_k \le y_k^T s_k$ holds at the
same time, then from (24), $\gamma_k = 0$. Moreover, $d_k$ is a descent direction.

Now we are ready to present the modified BFGS algorithm.

Algorithm 2.1 Modified BFGS

Data: $0 < \epsilon$, $0 < m < 1$, and $1 < M < \infty$, initial $x_0$, and $E_0 = I$.
for k = 0, 1, 2, ...
    Calculate gradient $g(x_k)$. If $\|g(x_k)\| < \epsilon$, stop;
    Compute search direction $d_k$ using (15);
    Set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ satisfies the Wolfe condition to be defined later;
    Compute $s_k$ and $y_k$ using (8);
    Select $\gamma_k$ using (24), and compute $z_k$ using (11);
    Update $E_{k+1}^{-1}$ using (13);
    $k \leftarrow k + 1$;
end

Remark 2.4 It is clear that the computation involving the selection of $\gamma_k$ is negligible (it requires
$O(n)$ operations). Therefore, the cost of the modified BFGS in each iteration is almost the same as the
cost of the BFGS.

In the rest of the paper, our discussion will focus on the proof of global and superlinear convergence
of Algorithm 2.1. The convergence properties are directly related to the goodness of the search direction
and step length. The quality of the search direction is measured by

$$\cos(\theta_k) = -\frac{g_k^T d_k}{\|g_k\| \|d_k\|} = \frac{s_k^T E_k s_k}{\|E_k s_k\| \|s_k\|}, \tag{25}$$

where the second equation follows from (14) and (6). A good step length $\alpha_k$ should satisfy the
following Wolfe condition.

$$f(x_k + \alpha_k d_k) \le f(x_k) + \sigma_1 \alpha_k g_k^T d_k, \tag{26a}$$
$$d_k^T g(x_k + \alpha_k d_k) \ge \sigma_2 g_k^T d_k, \tag{26b}$$

where $0 < \sigma_1 < \sigma_2 < 1$. The existence of a step size satisfying the Wolfe condition is
established in [K1[T2]. An algorithm that finds, in finite steps, a point satisfying the Wolfe condition is
given in [10]. Therefore, we will not discuss step size selection in this paper.
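For concreteness, conditions (26a) and (26b) can be checked for a trial step as in the sketch below. The function name `satisfies_wolfe` and the default values of `sigma1` and `sigma2` are illustrative assumptions (commonly used choices), not parameters reported for mBFGS, and the sketch only tests a given step rather than searching for one.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_wolfe(f, grad, x, d, alpha, sigma1=1e-4, sigma2=0.9):
    """Return True if step size alpha satisfies (26a) and (26b) along direction d."""
    gd = dot(grad(x), d)  # directional derivative g_k^T d_k (negative for descent)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + sigma1 * alpha * gd  # (26a)
    curvature = dot(grad(x_new), d) >= sigma2 * gd                # (26b)
    return sufficient_decrease and curvature


# Illustrative one-dimensional problem: f(x) = x^2 at x = 1 with d = -g = -2.
f = lambda x: x[0] ** 2
grad = lambda x: [2.0 * x[0]]
for alpha in (0.01, 0.5, 1.0):
    print(alpha, satisfies_wolfe(f, grad, [1.0], [-2.0], alpha))
```

In this example the very short step fails the curvature condition (26b), the full step fails the sufficient decrease condition (26a), and the intermediate step passes both.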

3 Convergence Analysis 

An important global convergence result was given by Zoutendijk [18], which can be stated as follows.

Theorem 3.1 Suppose that $f$ is bounded below in $\mathbb{R}^n$ and that $f$ is continuously twice
differentiable in an open set $\mathcal{M}$ containing the level set
$\mathcal{L} = \{x : f(x) \le f(x_0)\}$. Assume that the gradient is Lipschitz continuous on $\mathcal{M}$,
i.e., there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L \|x - y\|, \tag{27}$$

for all $x, y \in \mathcal{M}$. Assume further that $d_k$ is a descent direction and $\alpha_k$ satisfies the
Wolfe condition. Then

$$\sum_{k \ge 0} \cos^2(\theta_k) \|g_k\|^2 < \infty. \tag{28}$$

The Zoutendijk theorem indicates that if $d_k$ is a descent direction and $\cos(\theta_k) \ge \delta > 0$,
then the algorithm is globally convergent because $\lim_{k \to \infty} \|g_k\| = 0$.
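The step from (28) to the limit of the gradient norms is short and can be written out explicitly. The following is a standard argument, included here as a sketch for completeness.

```latex
\cos(\theta_k) \ge \delta > 0 \ \text{ for all } k
\quad \Longrightarrow \quad
\delta^2 \sum_{k \ge 0} \|g_k\|^2
\;\le\; \sum_{k \ge 0} \cos^2(\theta_k)\, \|g_k\|^2 \;<\; \infty
\quad \Longrightarrow \quad
\lim_{k \to \infty} \|g_k\| = 0 .
```

If the lower bound on $\cos(\theta_k)$ holds only along a subsequence, the same argument applied to that subsequence gives convergence of the gradient norms to zero along the subsequence only, which is the liminf-type statement.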

We are now ready to state the main convergence result for the modified BFGS algorithm. 

Theorem 3.2 Suppose that $f$ is bounded below in $\mathbb{R}^n$, $f$ is continuously twice differentiable
in an open set $\mathcal{M}$ containing the level set $\mathcal{L} = \{x : f(x) \le f(x_0)\}$, and
Assumptions 1-3 hold. Then Algorithm 2.1 is globally convergent in the sense that
$\liminf \|g_k\| \to 0$. In addition, if $s_k \to 0$, then Algorithm 2.1 converges to some $\bar{x}$
satisfying $\|g(\bar{x})\| = 0$ with superlinear rate.

Proof: First, we show that $d_k$ is a descent direction. From the selection of $\gamma_k$ using (24), we
know that (16) holds. The first inequality of (16) guarantees $z_k^T s_k > 0$. Using this fact and (13), we
conclude $E_k \succ 0$ since $E_0 = I \succ 0$. Therefore, $d_k$ is a descent direction. Since (16) holds for
all $k \ge 0$, following exactly the same arguments used in the proof of [12, Theorem 8.5], we have a
subsequence $\{j_k\}$ such that

$$\cos(\theta_{j_k}) \ge \delta > 0. \tag{29}$$

In view of Theorem 3.1, (29) indicates

$$\liminf \|g_k\| \to 0. \tag{30}$$

This means that there exist an $\bar{x}$ and a subsequence $x_{j_k}$ such that $x_{j_k} \to \bar{x}$ and
$\|g(\bar{x})\| = 0$. Since the function is locally strongly convex in a neighborhood of every local
minimizer $\bar{x}$ with $\|g(\bar{x})\| = 0$, every such $\bar{x}$ is isolated (there is a unique local
minimizer in the neighborhood). Therefore, the condition $s_k \to 0$ together with (30) is enough to prove
that $x_k \to \bar{x}$. Since Algorithm 2.1 is globally convergent, for sufficiently large $k$,
$f(x_k) \le f(\bar{x}) + \delta$. Therefore, for all $v \in \mathbb{R}^n$,

$$m \|v\|^2 \le v^T H(x_k) v \le M \|v\|^2. \tag{31}$$

Using Taylor's Theorem [12, Theorem 2.1],

$$y_k = g(x_{k+1}) - g(x_k) = \int_0^1 H(x_k + t \alpha_k d_k) s_k \, dt = \bar{H}_k s_k. \tag{32}$$

(31) and (32) imply that $\bar{H}_k$ is positive definite, i.e., for all $v \in \mathbb{R}^n$,

$$m \|v\|^2 \le v^T \bar{H}_k v \le M \|v\|^2. \tag{33}$$

This gives

$$\frac{y_k^T s_k}{s_k^T s_k} = \frac{s_k^T \bar{H}_k s_k}{s_k^T s_k} = \frac{s_k^T \bar{H}_k s_k}{\|s_k\| \|s_k\|} \ge m,$$

and

$$\frac{y_k^T y_k}{y_k^T s_k} = \frac{s_k^T \bar{H}_k^2 s_k}{s_k^T \bar{H}_k s_k}
= \frac{(\bar{H}_k^{1/2} s_k)^T \bar{H}_k (\bar{H}_k^{1/2} s_k)}{\|\bar{H}_k^{1/2} s_k\| \|\bar{H}_k^{1/2} s_k\|} \le M.$$

From these two inequalities, in view of Remark 2.3, we conclude that for large $k$, $\gamma_k = 0$ is always
selected. Therefore, the modified BFGS reduces to the standard BFGS for large $k$. In addition, if
Assumption 3 holds, the BFGS converges at a superlinear rate; therefore the modified BFGS also converges
at this rate because it is identical to the BFGS for large $k$. □



4 Implementation and Numerical Test 

This section provides detailed information about the implementation of the algorithm, which is slightly
different from the description of Algorithm 2.1. It also presents our test results against CUTE problems.

4.1 Implementation details 

Algorithm 2.1 has been implemented in the Matlab function mBFGS with the following considerations. First,
the selection of $m$ and $M$ turns out to be important. The $m$ and $M$ of $H(x)$ satisfying (4) depend on
the individual function to be optimized and on each of its local minimizers. To be safe, one may select a
small $m$ and a large $M$, which will likely cover all possible functions to be optimized, but this
selection may not be numerically stable. On the other hand, if $m$ is selected too big and/or $M$ is
selected too small, the selection may violate condition (4). Our default set of parameters is
$m = 0.00001$, $M = 100000$, and $\epsilon = 0.00001$, which, in general, gives very impressive
computational results.

Second, for several test problems, the condition numbers of the estimated $E_k^{-1}$ are poor at the early
iterations, which leads to a very large vector $d_k$. If this happens, the line search takes a long time to
find a better iterate. Therefore, $d_k$ is re-scaled such that $\|d_k\| = 10^6$ if $\|d_k\| > 10^6$ is
detected.

The test result of the algorithm with the above implementation is very impressive. However, for several
problems, it takes many steps to converge, which may be due to the poor estimation of $m$ and $M$ by
the default set of parameters. Therefore, a modification that dynamically selects $m_k$ and $M_k$ is
implemented in mBFGS for the purpose of getting a better estimation of the bounds for a particular local
minimizer of a particular function.

From (17) and (20), it is easy to see that $m$ only affects the value of $\hat{\gamma}_k$, and $M$ only
affects the value of $\underline{\gamma}_k$. We also noticed that $m$ and $M$ together affect the condition
number of $E_{k+1}$, which is important to the numerical stability in the computation of $d_{k+1}$ from
(14). This requires the ratio of $M$ to $m$ to be as small as possible. On the other hand, we want to select
$m$ and $M$ such that $\hat{\gamma}_k$ and $\underline{\gamma}_k$ are as small as possible, to maximize the
chance of using the BFGS formula. This requires a small $m$ and a large $M$, i.e., a ratio of $M$ to $m$
that is as large as possible. As a trade-off, we select the ratio of $M$ to $m$ as $10^{10}$. The nominal
parameters in mBFGS are $\hat{m} = 0.00001$, $\hat{M} = 100000$, and $\epsilon = 0.00001$. Because of the
fixed ratio of $M$ to $m$, we must increase or decrease $m$ and $M$ at the same time. From (17) and (20),
increasing $m$ and $M$ will decrease $\underline{\gamma}_k$ but increase $\hat{\gamma}_k$, and decreasing
$m$ and $M$ will increase $\underline{\gamma}_k$ but decrease $\hat{\gamma}_k$. From (24), we want the
difference between $\hat{\gamma}_k$ and $\underline{\gamma}_k$ to be small, so that the final choice
$\gamma_k$ is small. Therefore, we dynamically adjust $m$ and $M$ using the following simple heuristics.

Set $m = \hat{m}$, $M = \hat{M}$
Calculate $\hat{\gamma}_k$
if $\hat{\gamma}_k > 1$
    adjust $M = 10^4 M$
    then calculate $\underline{\gamma}_k$
else
    calculate $\underline{\gamma}_k$
    if $\underline{\gamma}_k - \hat{\gamma}_k > 0.2$ and $\underline{\gamma}_k > 0$
        adjust $M = 10^3 M$ and $m = 10^3 m$
        then recalculate $\underline{\gamma}_k$ and $\hat{\gamma}_k$
    elseif $\hat{\gamma}_k - \underline{\gamma}_k > 0.2$ and $\hat{\gamma}_k > 0$
        adjust $M = 10^{-2} M$ and $m = 10^{-2} m$
        then recalculate $\underline{\gamma}_k$ and $\hat{\gamma}_k$
    end
end

The above modification significantly reduces the number of iterations for the problems which used
many iterations when fixed $m$ and $M$ were used. Moreover, it has little impact on the remaining problems.

4.2 Numerical test

The modified BFGS implementation mBFGS and the BFGS algorithm implemented in the Matlab Optimization
Toolbox function fminunc are tested using the CUTE test problem set. The fminunc options are set as

options = optimset('LargeScale','off','MaxFunEvals',1e+20,'MaxIter',5e+5,'TolFun',1e-20,'TolX',1e-10)

This setting is selected to ensure that the BFGS implementation fminunc will have enough iterations
either to converge or to fail.

We conducted tests for both mBFGS and fminunc against the CUTE test problem set, which is downloaded
from the Princeton test problem collections [1]. Since the CUTE test set is presented in AMPL
mod-files, we first convert the AMPL mod-files into nl-files so that Matlab functions can read the CUTE
models; then we use the Matlab functions mBFGS and fminunc to read the nl-files and solve these test
problems. The objective function is calculated from the AMPL command [f,c] = amplfunc(x,0). The gradient
function is calculated from the AMPL command [g,Jac] = amplfunc(x,1). Both mBFGS and fminunc use these
values in the optimization algorithms. Because of a restriction of the conversion software which
converts mod-files to nl-files, the test is done for all CUTE unconstrained optimization problems whose
sizes are less than 300. The test uses the initial points provided by the CUTE test problem set. We record
the calculated objective function values, the norms of the gradients at the final points, and the iteration
numbers for these tested problems. We present these test results in Table 1. In this table, iter stands for
the total number of iterations used by the algorithm; obj is the value of the objective function achieved
at the end of the iterations; gradient is the norm of the gradient at the end of the iterations.



Table 1: Test results for problems in CUTE [3]; initial points are given in CUTE

               mBFGS                             fminunc
Problem   n    iter  obj             gradient    iter  obj             gradient
arglina   100  1     100.0           0.0         4     100.0           0.1662e-03
bard      3    18    0.8214e-02      0.2082e-06  20    0.8214e-02      0.1158e-05
beale     2    13    0.3075e-14      0.1817e-06  15    0.2400e-09      0.1392e-06
biggs6    6    70    0.1080e-09      0.7865e-05  68    0.4796e-03      0.1802e-01
box3      3    19    0.4077e-1       0.1828e-05  24    0.3880e-1       0.2364e-5
brkmcc    2    4     0.1690e-00      0.8114e-07  5     0.1690e-00      0.4542e-06
brownal   10   12    0.1406e-12      0.6968e-05  16    0.3050e-08      0.1043e-03
brownbs   2    632   0.1950e-17      0.5369e-06  11    0.9308e-04      15798.5950e-00
brownden  4    21    85822.2016e-00  0.3135e-03  32    85822.2017e-00
chnrosnb  50   158   0.1686e-15      0.9838e-05  98    30.0583e-00     10.1863e-00
cliff     2    27    0.1997e-00      0.8919e-05  1     1.0015e-00
cube      2    21    0.4231e-19      0.6905e-09  34    0.7987e-09      0.1340e-03
deconvu   51   80    0.3158e-06      0.1750e-03  80    0.3158e-06      0.1750e-03
denschna  2    7     0.1467e-13      0.3346e-06  10    0.5000e-12      0.1581e-05
denschnb  2    7     0.6047e-13      0.5505e-06  7     0.1000e-11      0.2200e-05
denschnc  2    8     0.1833e-00      0.4104e-07  21    0.1608e-08      0.3262e-03
denschnd  3    38    0.2461e-08      0.4040e-06  23    45.2971e-00     84.5851e-00
denschnf  2    10    0.4325e-17      0.3919e-07  10    0.2000e-10      0.1005028e-03
dixon3dq  10   15    0.5626e-15      0.5588e-07  20    0.1400e-11      0.3661e-5
djtl      2    79    -8951.5447e-00  0.2881e-01  3     -8033.8869e-00  1273.3319e-00
eigenals  110  79    0.7521e-13      0.3483e-05  78    0.1092e-02      0.1029e-00
eigenbls  110  513   0.1994e-10      0.7166e-05  91    0.3462e-00      0.4642e-00
engval2   3    27    0.1172e-12      0.9109e-95  29    0.3953e-09      0.2799e-03
errinros  81   48    0.2442e-96      0.5141e-95  92    0.4577e-03      0.2553e-00
expfit    2    11    0.2405e-00      0.2350e-05  12    0.2405e-05      0.2263e-05
extrosnb  10   1     0.0             0.0         1     0.0             0.0
fletcbv2  100  97    -0.5140e-00     0.9676e-05  98    -0.5140e-00     0.1087e-4
fletchcr  100  179   0.1114e-13      0.4440e-05  63    68.1289e-00     160.9879e-00
genhumps  5    47    0.1871e-09      0.8571e-05  59    0.4493e-07      0.3167e-03
growthls  3    1     3542.1490e-00               12    12.4523e-00     0.5809e-01
hairy     2    18    20.0            0.5710e-05  22    20.0            0.3810e-04
hatfldd   3    23    0.6615e-07      0.1853e-05  19    0.066150e-07    0.2355e-05
hatflde   3    28    0.4434e-06      0.5321e-05  9     0.6210e-06      0.7970e-05
heart6ls  6    2180  0.2620e-14      0.6696e-5   53    0.6318e-00      71.9382548e-00
helix     3    22    0.5489e-16      0.1901e-06  29    0.2260e-10      0.4196e-04
hilberta  10   20    0.2422e-06      0.5563e-05  35    0.2289e-06      0.3263e-05
hilbertb  50   11    0.4606e-12      0.3123e-05  6     0.21e-11        0.6542e-5
himmelbb  2    1     0.9665e-13      0.1153e-06  6     0.1462e-04      0.1251e-02
himmelbf  2    7     0.1069e-12      0.9337e-06  8     0.1000e-12      0.1448e-05
himmelbg  2    7     0.1069e-12      0.9337e-06  8     0.1000e-12      0.1448e-05
himmelbh  2    7     -0.9999e-00     0.5188e-06  7     -0.9999e-00     0.2607e-06
humps     2    85    0.2281e-09      0.6755e-05  25    5.4248e-00      2.3625e-00
jensmp    2    1     2020                        16    124.3621e-00    0.2897e-05
kowosb    4    27    0.3075e-03      0.1426e-05  33    0.3075e-03      0.1253e-06
loghairy  2    87    0.1823e-00      0.6892e-06  11    2.5199e-00      0.5377e-02
mancino   100  35    0.162e-16       0.9715e-05  9     0.2204e-02      1.2243e-00
maratosb  2    3     -1.0            0.5142e-07  2     -0.9997e-00     0.3570e-01
mexhat    2    1     -0.4009e-01     0.6621e-05  4     -0.4009e-01     0.13703e-04
osborneb  11   51    0.4013e-01      0.2474e-05  76    0.4013e-01      0.7884e-05
palmer1c  8    32    0.9759e-00      0.3995e-06  38    16139.4418e-00  655.0159e-00
palmer2c  8    86    0.1442e-01      0.1013e-05  60    98.0867e-00     33.4524e-00
palmer3c  8    47    0.1953e-01      0.8975e-05  56    54.3139e-00     7.8518e-00
palmer4c  8    77    0.5031e-01      0.6125e-05  56    62.2623e-00     6.6799e-00
palmer5c  6    12    2.1280e-00      0.2881e-05  14    2.1280e-00      0.7484e-03
palmer6c  8    55    0.1638e-01      0.4847e-05  43    18.0992e-00     0.7851e-00
palmer7c  8    40    0.6019e-00      0.1289e-05  28    56.9098e-00     4.0268e-00
palmer8c  8    46    0.1597e-00      0.4974e-05  49    22.4365e-00     1.3147e-00
powellsq  2
rosenbr   2    32    0.1382e-15      0.5153e-06  36    0.283e-10       2.6095e-5
sineval   2    68    0.1271e-15      0.9558e-06  47    0.2212e-00      1.2315e-00
sisser    2    14    0.2809e-07      0.8680e-05  11    0.1540e-7       0.7282671e-5
tointqor  50   37    1175.4722e-00   0.7734e-05  40    1175.4722e-00   0.9041e-07
vardim    100  21    0.0001e-20      0.1126e-07  1     0.2244e-06      0.5511e-00
watson    31   48    0.2442e-06      0.5141e-05  90    0.1050e-02      0.4875e-00
yfitu     3    74    0.6670e-12      0.5396e-05  57    0.4398e-02      11.8427e-00

We summarize the comparison of the test results as follows:

1. The modified BFGS function mBFGS converges in all the test problems, i.e., the termination condition
$\|g(x_k)\| < 10^{-5}$ is met, except for three problems: brownden, deconvu, and djtl. But for these
problems, mBFGS finds better solutions than fminunc. Moreover, for about 40% of the problems, the
Matlab Toolbox BFGS function fminunc does not reduce $\|g(x_k)\|$ to below 0.01. For these
problems, the objective function values obtained by fminunc normally are not close to the minimum;

2. For those problems on which both mBFGS and fminunc converge, mBFGS most of the time uses fewer
iterations than fminunc and converges to a point with smaller $\|g(x_k)\|$;

3. For three problems (denschnc, growthls, and jensmp), mBFGS converges to a local minimum
but fminunc finds a better point.

Remark 4.1 We also tried (12) and (14) rather than (13) and (15) in the calculation of the search
direction $d_k$. With this implementation, the algorithm converges with $\|g(x_k)\| < 10^{-5}$ as required
for all the test problems, including brownden, deconvu, and djtl. We noticed that although mBFGS stops
before $\|g(x_k)\| < 10^{-5}$ is achieved for these three problems, the objective function values obtained
are essentially the same as if (12) and (14) are used. Given the fact that using (13) and (15) does not
require solving linear systems of equations but using (12) and (14) does, we suggest using the
implementation described in this section.

Most of the above problems are also used, for example in [5], to test some established and state-of-the-art 
algorithms. In [6], 145 CUTEr unconstrained problems are tested against a limited memory BFGS 
algorithm [11] (implemented as L-BFGS), a descent and conjugate gradient algorithm [8] (implemented as 
CG-Descent 5.3), and a limited memory descent and conjugate gradient algorithm [7] (implemented as 
L-CG-Descent). The sizes of most of these test problems are smaller than or equal to 300. The size of 
the largest test problem in [6] is 10000. Since our AMPL conversion software does not work for problems 
whose sizes are larger than 300, we consider only the problems in [6] whose sizes are less than or equal to 300. 
We compare the test results obtained by our implementation of Algorithm 2.1 with the results obtained by 
the algorithms of [11], [8], and [7] (reported in [6]). For this test, we changed the stopping criterion to $\|g(x)\|_\infty < 10^{-6}$ 
for consistency. The test results are listed in Table 2. 
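
The two stopping tests can be contrasted with a short Python sketch. It assumes that the norm in the first test is the Euclidean norm (the text does not name it); the function names and the sample gradient are hypothetical and only serve the illustration.

```python
import math


def two_norm(g):
    """Euclidean norm of the gradient vector g (assumed norm for the first test)."""
    return math.sqrt(sum(v * v for v in g))


def inf_norm(g):
    """Infinity norm of g: the largest absolute component."""
    return max(abs(v) for v in g)


def stop_first_test(g, tol=1e-5):
    """Stopping test of the first experiment: ||g|| < 1e-5."""
    return two_norm(g) < tol


def stop_second_test(g, tol=1e-6):
    """Stopping test of the second experiment: ||g||_inf < 1e-6."""
    return inf_norm(g) < tol


# Hypothetical gradient: small enough for the first test, not for the second.
g_example = [4e-6, 0.0, 0.0]
first = stop_first_test(g_example)
second = stop_second_test(g_example)
print(first, second)
```

For this sample gradient the first test is satisfied while the second is not, so a run that stops under the first criterion may still need further iterations under the second.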



Table 2: Comparison of mBFGS, L-CG-Descent, L-BFGS, and 
CG-Descent 5.3 for problems in CUTE [3]; initial points are those given 
in CUTE 



Problem   size  methods         iter    nFun    nGrad   obj          gradient
arglina   200   mBFGS           1       14      2       1.000e+002   1.894e-015
                L-CG-Descent    1       3       2       2.000e+002   3.384e-008
                L-BFGS          1       3       2       2.000e+002   3.384e-008
                CG-Descent 5.3  1       3       2       2.000e+002   2.390e-007
bard      3     mBFGS           18      95      19      1.157e-001   9.765e-007
                L-CG-Descent    16      33      17      8.215e-003   3.673e-009
                L-BFGS          16      33      17      8.215e-003   3.673e-009
                CG-Descent 5.3  21      44      23      8.215e-003   1.912e-007
beale     2     mBFGS           13      76      14      4.957e-020   2.979e-010
                L-CG-Descent    15      31      16      2.727e-015   4.499e-008
                L-BFGS          15      31      16      2.727e-015   4.499e-008
                CG-Descent 5.3  18      37      19      1.497e-007   4.297e-007
biggs6    6     mBFGS           73      335     74      7.777e-013   4.920e-007
                L-CG-Descent    27      57      31      5.656e-003   2.514e-008
                L-BFGS          27      57      31      5.656e-003   2.514e-008
                CG-Descent 5.3  85      177     93      5.656e-003   9.195e-007
box3      3     mBFGS           21      77      22      1.692e-016   4.450e-008
                L-CG-Descent    11      24      13      3.819e-013   7.584e-007
                L-BFGS          11      24      13      3.819e-013   7.584e-007
                CG-Descent 5.3  13      27      14      1.707e-010   6.003e-007
brkmcc    2     mBFGS           4       34      5       1.690e-001   8.034e-008
                L-CG-Descent    5       11      6       1.690e-001   6.220e-008
                L-BFGS          5       11      6       1.690e-001   6.220e-008
                CG-Descent 5.3  4       9       5       1.690e-001   5.272e-008
brownbs   2     mBFGS           632     13543   633     1.952e-018   5.369e-007
                L-CG-Descent    13      26      15      0.000e+000   0.000e+000
                L-BFGS          13      26      15      0.000e+000   0.000e+000
                CG-Descent 5.3  16      40      33      1.972e-031   8.882e-010
brownden  4     mBFGS           21      312     22      8.582e+004   3.092e-010
                L-CG-Descent    16      31      19      8.582e+004   1.282e-007
                L-BFGS          16      31      19      8.582e+004   1.282e-007
                CG-Descent 5.3  38      74      48      8.582e+004   9.083e-007
chnrosnb  50    mBFGS           160     2185    161     1.263e-015   3.525e-007
                L-CG-Descent    287     564     299     6.818e-014   5.414e-007
                L-BFGS          216     427     233     1.582e-013   5.565e-007
                CG-Descent 5.3  287     564     299     6.818e-014   5.414e-007
cliff     2     mBFGS           15      75      16      1.998e-001   7.602e-008
                L-CG-Descent    18      70      54      1.998e-001   2.316e-009
                L-BFGS          18      70      54      1.998e-001   2.316e-009
                CG-Descent 5.3  19      40      21      1.998e-001   6.352e-008
cube      2     mBFGS           21      134     22      4.231e-020   6.845e-010
                L-CG-Descent    32      77      47      1.269e-017   1.225e-009
                L-BFGS          32      77      47      1.269e-017   1.225e-009
                CG-Descent 5.3  33      80      49      6.059e-015   4.697e-008
deconvu   61    mBFGS           67      855     68      1.567e-009   9.999e-007
                L-CG-Descent    475     951     476     1.189e-008   9.187e-007
                L-BFGS          208     417     209     2.171e-010   8.924e-007
                CG-Descent 5.3  475     951     476     1.184e-008   9.078e-007
denschna  2     mBFGS           7       35      8       1.468e-014   3.198e-007
                L-CG-Descent    9       19      10      3.167e-016   3.527e-008
                L-BFGS          9       19      10      3.167e-016   3.527e-008
                CG-Descent 5.3  9       19      10      7.355e-016   4.825e-008
denschnb  2     mBFGS           7       44      8       6.048e-014   4.252e-007
                L-CG-Descent    7       15      8       3.641e-017   1.034e-008
                L-BFGS          7       15      8       3.641e-017   1.034e-008
                CG-Descent 5.3  8       17      8       4.702e-014   4.131e-007
denschnc  2     mBFGS           8       55      9       1.119e-021   1.731e-010
                L-CG-Descent    12      26      14      3.253e-019   3.276e-009
                L-BFGS          12      26      14      3.253e-019   3.276e-009
                CG-Descent 5.3  12      27      15      1.834e-001   4.143e-007
denschnd  3     mBFGS           38      308     39      2.461e-009   3.146e-007
                L-CG-Descent    47      98      51      4.331e-010   8.483e-007
                L-BFGS          47      98      51      4.331e-010   8.483e-007
                CG-Descent 5.3  45      97      54      8.800e-009   6.115e-007
denschnf  2     mBFGS           10      75      11      4.325e-018   3.027e-008
                L-CG-Descent    8       17      9       2.126e-015   6.455e-007
                L-BFGS          8       17      9       2.126e-015   6.455e-007
                CG-Descent 5.3  11      24      13      1.104e-017   6.614e-008
djtl      2     mBFGS           79      1524    80      -8.952e+003  2.265e-002
                L-CG-Descent    82      917     880     -8.952e+003  8.865e-009
                L-BFGS          82      917     880     -8.952e+003  8.865e-009
                CG-Descent 5.3  93      770     714     -8.952e+003  3.521e-007
engval2   3     mBFGS           28      188     29      1.999e-018   9.405e-008
                L-CG-Descent    26      61      37      1.034e-016   8.236e-007
                L-BFGS          26      61      37      1.034e-016   8.236e-007
                CG-Descent 5.3  76      161     88      3.185e-014   5.682e-007
expfit    2     mBFGS           12      103     13      2.405e-001   2.916e-009
                L-CG-Descent    13      29      16      2.405e-001   4.208e-007
                L-BFGS          13      29      16      2.405e-001   4.208e-007
                CG-Descent 5.3  15      34      20      2.405e-001   1.758e-007
growthls  3     mBFGS           1       3       2       3.542e+003   0.000e-999
                L-CG-Descent    143     425     299     1.004e+000   3.317e-007
                L-BFGS          143     425     399     1.004e+000   3.317e-007
                CG-Descent 5.3  441     997     596     1.004e+000   1.835e-007
hairy     2     mBFGS           19      160     20      2.000e+001   6.143e-008
                L-CG-Descent    36      99      65      2.000e+001   7.961e-011
                L-BFGS          36      99      65      2.000e+001   7.961e-011
                CG-Descent 5.3  14      35      24      2.000e+001   1.044e-007
hatfldd   3     mBFGS           24      119     25      6.615e-008   1.107e-007
                L-CG-Descent    20      43      24      2.547e-007   1.936e-007
                L-BFGS          20      43      24      2.547e-007   1.936e-007
                CG-Descent 5.3  40      98      61      6.617e-008   1.934e-007
hatflde   3     mBFGS           30      136     31      4.434e-007   6.576e-007
                L-CG-Descent    30      72      45      2.000e+001   5.012e-007
                L-BFGS          30      72      45      2.000e+001   5.012e-007
                CG-Descent 5.3  53      120     72      2.000e+001   5.012e-007
heart6ls  6     mBFGS           2266    8700    2267    2.865e-023   6.934e-009
                L-CG-Descent    684     1576    941     2.646e-010   5.562e-007
                L-BFGS          684     1576    941     2.646e-010   5.562e-007
                CG-Descent 5.3  2570    5841    3484    1.305e-010   2.421e-007
helix     3     mBFGS           22      154     23      5.489e-017   1.349e-007
                L-CG-Descent    23      49      27      1.604e-015   3.135e-007
                L-BFGS          23      49      27      1.604e-015   3.135e-007
                CG-Descent 5.3  44      90      46      2.427e-013   6.444e-007
himmelbb  2     mBFGS           1       23      2       9.665e-014   8.167e-008
                L-CG-Descent    10      28      21      9.294e-013   2.375e-007
                L-BFGS          10      28      21      9.294e-013   2.375e-007
                CG-Descent 5.3  11      23      12      1.584e-013   1.084e-008
himmelbg  2     mBFGS           7       45      8       1.070e-013   9.071e-007
                L-CG-Descent    8       20      13      9.294e-013   2.375e-007
                L-BFGS          8       20      13      9.294e-013   2.375e-007
                CG-Descent 5.3  10      24      15      1.584e-013   1.084e-008
himmelbh  2     mBFGS           7       45      8       -1.000e+000  5.026e-007
                L-CG-Descent    7       16      9       -1.000e+000  2.892e-011
                L-BFGS          7       16      9       -1.000e+000  2.892e-011
                CG-Descent 5.3  7       16      9       -1.000e+000  1.381e-007
humps     2     mBFGS           104     857     105     3.280e-016   6.351e-009
                L-CG-Descent    53      165     120     3.682e-012   8.552e-007
                L-BFGS          53      165     120     3.682e-012   8.552e-007
                CG-Descent 5.3  48      140     101     3.916e-012   8.774e-007
jensmp    2     mBFGS           1       3       2       2.020e+003   0.000e-999
                L-CG-Descent    15      33      22      1.244e+002   5.302e-010
                L-BFGS          15      33      22      1.244e+002   5.302e-010
                CG-Descent 5.3  13      29      19      1.244e+002   4.206e-009
kowosb    4     mBFGS           28      147     29      3.075e-004   1.367e-007
                L-CG-Descent    17      39      23      3.078e-004   3.704e-007
                L-BFGS          17      39      23      3.078e-004   3.704e-007
                CG-Descent 5.3  66      139     76      3.078e-004   8.818e-007
loghairy  2     mBFGS           74      882     75      1.823e-001   5.904e-007
                L-CG-Descent    27      81      58      1.823e-001   1.762e-007
                L-BFGS          27      81      58      1.823e-001   1.762e-007
                CG-Descent 5.3  46      136     97      1.823e-001   7.562e-008
mancino   100   mBFGS           37      1202    38      1.548e-020   1.414e-007
                L-CG-Descent    11      23      12      9.245e-021   7.239e-008
                L-BFGS          9       19      30      3.048e-021   1.576e-007
                CG-Descent 5.3  11      23      12      9.245e-021   7.239e-008
maratosb  2     mBFGS           3       59      4       -1.000e+000  5.142e-008
                L-CG-Descent    1145    3657    2779    -1.000e+000  3.216e-007
                L-BFGS          1145    3657    2779    -1.000e+000  3.216e-007
                CG-Descent 5.3  946     2911    2191    -1.000e+000  3.230e-009
mexhat    2     mBFGS           5       32      6       -4.010e-002  1.426e-012
                L-CG-Descent    20      56      39      -4.001e-002  4.934e-009
                L-BFGS          20      56      39      -4.001e-002  4.934e-009
                CG-Descent 5.3  27      61      36      -4.001e-002  3.014e-007
osborneb  11    mBFGS           53      377     54      4.014e-002   2.480e-007
                L-CG-Descent    62      127     65      4.014e-002   4.427e-007
                L-BFGS          62      127     65      4.014e-002   4.427e-007
                CG-Descent 5.3  214     423     219     4.014e-002   7.485e-007
palmer1c  8     mBFGS           32      211     33      9.760e-002   3.935e-007
                L-CG-Descent    11      26      26      9.761e-002   1.254e-009
                L-BFGS          11      26      26      9.761e-002   1.254e-009
                CG-Descent 5.3  126827  224532  378489  9.761e-002   9.545e-007
palmer2c  8     mBFGS           112     446     113     1.442e-002   8.296e-007
                L-CG-Descent    11      21      21      1.437e-002   1.257e-008
                L-BFGS          11      21      21      1.437e-002   1.257e-008
                CG-Descent 5.3  21362   21455   42837   1.437e-002   5.761e-007
palmer3c  8     mBFGS           47      245     48      1.954e-002   2.050e-008
                L-CG-Descent    11      20      20      1.954e-002   1.754e-010
                L-BFGS          11      20      20      1.954e-002   1.754e-010
                CG-Descent 5.3  5536    5777    11379   1.954e-002   9.753e-007
palmer4c  8     mBFGS           78      351     79      5.031e-002   2.235e-007
                L-CG-Descent    11      20      20      5.031e-002   3.928e-009
                L-BFGS          11      20      20      5.031e-002   3.928e-009
                CG-Descent 5.3  44211   49913   96429   5.031e-002   9.657e-007
palmer5c  6     mBFGS           13      157     14      2.128e+000   4.810e-009
                L-CG-Descent    6       13      7       2.128e+000   3.749e-012
                L-BFGS          6       13      7       2.128e+000   3.749e-012
                CG-Descent 5.3  6       13      7       2.128e+000   2.629e-009
palmer6c  8     mBFGS           56      243     57      1.639e-002   6.900e-007
                L-CG-Descent    11      24      24      1.639e-002   5.520e-009
                L-BFGS          11      24      24      1.639e-002   5.520e-009
                CG-Descent 5.3  14174   142228  28411   1.639e-002   7.738e-007
palmer7c  8     mBFGS           41      212     42      6.020e-001   5.201e-007
                L-CG-Descent    11      20      20      6.020e-001   7.132e-009
                L-BFGS          11      20      20      6.020e-001   7.132e-009
                CG-Descent 5.3  65294   78428   149585  6.020e-001   9.957e-007
palmer8c  8     mBFGS           48      361     49      1.598e-001   1.099e-009
                L-CG-Descent    11      18      17      1.598e-001   2.376e-009
                L-BFGS          11      18      17      1.598e-001   2.376e-009
                CG-Descent 5.3  8935    9903    19183   1.598e-001   9.394e-007
rosenbr   2     mBFGS           32      241     33      1.383e-016   4.603e-007
                L-CG-Descent    34      77      44      4.691e-018   7.167e-008
                L-BFGS          34      77      44      4.691e-018   7.167e-008
                CG-Descent 5.3  37      86      52      1.004e-014   1.894e-007
sineval   2     mBFGS           69      489     70      1.910e-019   1.168e-008
                L-CG-Descent    60      143     87      1.556e-023   1.817e-011
                L-BFGS          60      143     87      1.556e-023   1.817e-011
                CG-Descent 5.3  62      172     122     1.023e-012   5.575e-007
sisser    2     mBFGS           19      83      20      3.860e-010   4.587e-007
                L-CG-Descent    6       18      14      6.830e-012   2.220e-008
                L-BFGS          6       18      14      6.830e-012   2.220e-008
                CG-Descent 5.3  6       13      7       3.026e-014   3.663e-010
tointqor  50    mBFGS           39      615     40      1.176e+003   4.033e-007
                L-CG-Descent    29      36      53      1.175e+003   4.467e-007
                L-BFGS          28      35      51      1.175e+003   7.482e-007
                CG-Descent 5.3  29      36      53      1.175e+003   4.464e-007
vardim    200   mBFGS           22      154     23      1.237e-021   1.376e-009
                L-CG-Descent    10      21      11      4.168e-019   2.582e-007
                L-BFGS          7       31      27      5.890e-025   3.070e-010
                CG-Descent 5.3  10      21      11      4.168e-019   2.582e-007
watson    12    mBFGS           61      308     62      1.130e-008   3.081e-007
                L-CG-Descent    49      102     54      1.592e-007   8.026e-007
                L-BFGS          48      97      49      9.340e-008   1.319e-007
                CG-Descent 5.3  726     145     727     1.139e-007   8.115e-007
yfitu     2     mBFGS           73      462     74      6.670e-013   1.938e-007
                L-CG-Descent    75      177     106     8.074e-010   3.910e-007
                L-BFGS          75      177     106     8.074e-010   3.910e-007
                CG-Descent 5.3  147     327     189     2.969e-011   5.681e-007



We summarize the comparison of the test results as follows: 

1. For two problems (arglina and biggs6), mBFGS converges to better points than L-CG-Descent, 
L-BFGS, and CG-Descent 5.3. For another two problems (growthls and jensmp), L-CG-Descent, 
L-BFGS, and CG-Descent 5.3 converge to better points. 

2. For 19 problems, mBFGS converges faster than L-CG-Descent, L-BFGS, and CG-Descent 5.3. For 
about 10 problems, mBFGS converges slower than L-CG-Descent, L-BFGS, and CG-Descent 5.3. For 
the remaining problems, mBFGS converges either faster than some codes but slower than others, or as fast 
as all the other codes. 
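
The kind of tally described in item 2 can be sketched as follows. The text does not say which column defines "faster", so this sketch assumes the iteration count (iter); the function name `classify` is hypothetical, and the iteration counts are copied from six rows of Table 2 in the order mBFGS, L-CG-Descent, L-BFGS, CG-Descent 5.3.

```python
# Iteration counts copied from six rows of Table 2:
# problem -> (mBFGS, L-CG-Descent, L-BFGS, CG-Descent 5.3).
ITERATIONS = {
    "cube": (21, 32, 32, 33),
    "deconvu": (67, 475, 208, 475),
    "maratosb": (3, 1145, 1145, 946),
    "brownbs": (632, 13, 13, 16),
    "mancino": (37, 11, 9, 11),
    "hairy": (19, 36, 36, 14),
}


def classify(mbfgs_iter, other_iters):
    """Compare mBFGS with the other codes by iteration count (an assumption)."""
    if all(mbfgs_iter < other for other in other_iters):
        return "faster"
    if all(mbfgs_iter > other for other in other_iters):
        return "slower"
    return "mixed"


# Classify each sampled problem.
outcome = {
    name: classify(counts[0], counts[1:]) for name, counts in ITERATIONS.items()
}

for name, label in outcome.items():
    print(name, label)
```

With a different measure, such as the number of function evaluations (nFun), some of these rows would be classified differently, so the counts in item 2 depend on the measure the author had in mind.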

Based on these numerical test results, we believe that the proposed algorithm is very promising. 

5 Conclusions 

We have proposed a modified BFGS algorithm and proved that the modified BFGS algorithm is globally 
and superlinearly convergent. We have shown that the computational cost in each iteration is almost the 
same for both the BFGS and the modified BFGS. We have provided numerical test results and compared 
the performance of the modified BFGS to the performance of implementations of other established and 
state-of-the-art algorithms, such as BFGS, limited memory BFGS, descent and conjugate gradient, and 
limited memory descent and conjugate gradient. The results and comparison show that the modified 
BFGS algorithm appears very effective. 

6 Acknowledgements 

The author would like to thank Professor Jorge Nocedal who suggested conducting the test for the 
modified BFGS and BFGS against the CUTE test problems, and comparing the performance. The 
author also thanks Dr. Sven Leyffer at Argonne National Laboratory for making him aware of Professor 
W. Hager's website which has test results of L-CG-Descent, L-BFGS, and CG-Descent 5.3. The author 
is grateful to Professor Teresa Monterio at Universidade do Minho for providing the software to convert 
AMPL mod-files into nl-files, which makes the test possible. 

References 

[1] http://www.orfe.princeton.edu/~rvdb/ampl/nlmodels/index.html. 

[2] D.P. Bertsekas (1996), Nonlinear Programming, Belmont, Massachusetts: Athena Scientific. 

[3] I. Bongartz and A. R. Conn and N. Gould and P.L. Toint (1995), CUTE: Constrained and Unconstrained 
Testing Environment, ACM Transactions on Mathematical Software, Vol. 21, pp. 123-160. 

[4] Yu-Hong Dai (2002), Convergence properties of the BFGS Algorithm, SIAM Journal on Optimization, 
Vol. 13, pp. 693-701. 

[5] G. H. Golub and C. F. Van Loan (1989), Matrix Computations, Baltimore: The Johns Hopkins 
University Press. 

[6] W. W. Hager and H. Zhang, http://www.math.ufl.edu/~hager/CG/results6.0.txt. 

[7] W. W. Hager and H. Zhang, The limited memory conjugate gradient method, 
http://www.math.ufl.edu/~hager/CG/results6.0.txt. 

[8] W. W. Hager and H. Zhang (2005), A new conjugate gradient method with guaranteed descent and 
an efficient line search, SIAM J. Optimization, Vol. 16, pp. 170-192. 

[9] Dong-Hui Li and Masao Fukushima (2001), A modified BFGS method and its global convergence in 
nonconvex minimization, Journal of Computational and Applied Mathematics, Vol. 129, pp. 15-35. 

[10] J. More and D. J. Thuente (1990), On line search algorithms with guaranteed sufficient decrease, 
Technical Report MCS-P153-0590, Mathematics and Computer Science Division, Argonne National 
Laboratory, Argonne, IL. 

[11] J. Nocedal (1980), Updating quasi-Newton matrices with limited storage, Math. Comp., Vol. 35, pp. 
773-782. 

[12] J. Nocedal and S.J. Wright (1993), Numerical Optimization, New York: Springer-Verlag. 

[13] M. J. D. Powell (1976), Some global convergence properties of a variable metric algorithm for 
minimization without exact line searches, In R. W. Cottle and C. E. Lemke, (Eds.), SIAM-AMS 
Proceedings Vol. IX, Philadelphia, PA: SIAM, pp. 53-72. 

[14] Z. Wei and G. Li and L. Qi (2006), New quasi-Newton methods for unconstrained optimization 
problems, Applied Mathematics and Computation, 175, pp. 1156-1188. 

[15] P. Wolfe (1969), Convergence conditions for ascent methods, SIAM Review, Vol. 11, pp. 226-235. 

[16] P. Wolfe (1971), Convergence conditions for ascent methods II: Some Corrections, SIAM Review, 
Vol. 13, pp. 185-188. 

[17] Y. Yang (2012), A Globally and Quadratically Convergent Algorithm for Nonconvex Unconstrained 
Optimization, arXiv:1212.5452 [math.OC]. 

[18] G. Zoutendijk (1970), Nonlinear Programming, Computational Methods, In J. Abadie, (Eds.), 
Integer and Nonlinear Programming, North Holland: Amsterdam, pp. 37-86. 






